
Molecular Biology and Evolution

Oxford University Press (OUP)

Preprints posted in the last 30 days, ranked by how well they match Molecular Biology and Evolution's content profile, based on 488 papers previously published here. The average preprint has a 0.21% match score for this journal, so anything above that is already an above-average fit.

1
Modeling Site-Specific Mutation Patterns in Pandemic-Scale Phylogenetics

Martin, S.; Ly-Trong, N.; Minh, B. Q.; Goldman, N.; De Maio, N.

2026-05-04 evolutionary biology 10.64898/2026.04.30.721865 medRxiv
Top 0.1%
42.1%

Models of genome evolution often account for different evolutionary rates at different genome positions due to, e.g., varying selective pressures or mutation rates. Recent evidence from millions of publicly shared SARS-CoV-2 genomes has revealed a more complex mutational landscape than can be modeled with existing approaches. Here, mutation rates are in fact not only highly position-specific, as currently modeled, but also nucleotide-specific; for example, specific mutations can occur very often at certain determined genome positions, while at the same positions other mutations might not be highly recurrent. Here, we propose and investigate a general model of genome evolution where each genome position is allowed to evolve under an independent, non-normalized substitution rate matrix describing site-specific rates of all mutation types ("Site-Specific Matrix" model, or SSM). We implement SSM in the efficient pandemic-scale phylogenetic inference software CMAPLE. Large-scale genomic epidemiological simulations suggest that, given enough data, SSM can accurately infer position- and nucleotide-specific substitution rates for more frequently observed nucleotides (typically the reference nucleotide), while other rates require higher levels of divergence. Simulations also show that SSM has a modest impact on the accuracy of phylogenetic tree estimation. We use SSM to analyze the evolution of millions of SARS-CoV-2 genomes and observe substantial mismatches between the substitution rates of classical rate variation models and our SSM estimates. These results suggest that classical models of rate variation are inadequate for modeling site-specific mutation patterns and that SSM is a useful alternative for large-scale genome analyses.

2
Evolutionary rate correlations reveal long-term co-evolutionary interactions in Drosophila melanogaster

Dagilis, A. J.; DiAngelis, B.; Lee, S.; Matute, D. R.

2026-05-23 evolutionary biology 10.64898/2026.05.21.726714 medRxiv
Top 0.1%
32.3%

Co-evolution between genes can occur for a variety of reasons, including co-expression of genes, epistatic interactions between them, physical interactions of gene products and many others. Co-evolutionary partners of a gene are therefore of great interest in identifying potential factors that contribute to any phenotype of interest. State-of-the-art approaches to detect these interactions use correlations of evolutionary rates across a broader phylogeny, and so by necessity identify interactions only among genes that are present across long evolutionary time periods. This makes the methods unwieldy when interest lies in a single focal organism in which the genes of interest may have evolved in the recent evolutionary past. Here, we present a new approach to calculating evolutionary rate correlations which focuses on extracting maximum coverage for a single focal species, while retaining signals of co-evolution across large clades. We show how this approach is able to identify potential interactions even in highly studied species and highly studied genes, with a focus on the D. melanogaster sex-determiner, Sxl, using data from 72 species of Dipterans.
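As a rough illustration of what an evolutionary rate correlation measures, the sketch below correlates per-branch relative rates of two genes over the branches where both genes are present. The dictionary layout, the branch labels, and the use of a plain Pearson correlation are assumptions made for illustration; the abstract does not specify how the authors compute or normalize rates.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def erc(rates_a, rates_b):
    """Correlate two genes' relative rates over the branches they share.

    rates_a, rates_b: hypothetical dicts mapping branch label -> relative rate.
    Branches where either gene is missing are ignored.
    """
    shared = sorted(set(rates_a) & set(rates_b))
    return pearson([rates_a[b] for b in shared], [rates_b[b] for b in shared])


# Hypothetical rates; gene B is absent from branch "b4".
gene_a = {"b1": 1.0, "b2": 2.0, "b3": 3.0, "b4": 0.5}
gene_b = {"b1": 2.0, "b2": 4.0, "b3": 6.0}
print(erc(gene_a, gene_b))
```

Restricting the calculation to shared branches is the simplest way to handle genes with patchy presence across a phylogeny; it is one possible choice, not necessarily the one made in the preprint.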

3
Evolution of regulatory networks controlling plasticity in gene expression between Saccharomyces cerevisiae and Saccharomyces paradoxus

Redhuis, A. C.; Wittkopp, P. J.

2026-05-20 evolutionary biology 10.64898/2026.05.18.725926 medRxiv
Top 0.1%
32.2%

Organisms cope with environmental changes by modifying gene expression. To understand how regulatory networks controlling expression plasticity evolve, we analyzed RNAseq data from Saccharomyces cerevisiae, Saccharomyces paradoxus, and their F1 hybrids at multiple timepoints after transferring cells from standard laboratory conditions to five environments (low phosphorus, low nitrogen, hydroxyurea shock, heat stress, and cold stress) and during the diauxic shift. In each of the six datasets, we identified genes that changed expression following the transition to the new environment and used hierarchical clustering to identify genes that increased or decreased in expression. We then compared these classifications between orthologs to identify genes with divergent plasticity. For some genes, plasticity was more extreme in one species than the other, and for others, expression of orthologs changed in opposite directions when acclimating to the same environment. Most cases of plasticity divergence were seen only in one environment and were attributable primarily to trans-regulatory divergence. Using environment-specific regulatory networks inferred from data in Yeastract, we found that divergent plasticity of environment-specific transcription factors generally did not predict divergent plasticity of their target genes. We also found that, as a group, genes with conserved plasticity tended to have more regulatory interactions than genes with divergent plasticity. Interesting patterns of expression divergence were also observed for five transcription factors in the pleiotropic drug resistance network and their target genes that might contribute to phenotypic divergence. Together, these findings show how environment-specific trans-regulatory divergence and combinatorial gene regulation shape the evolution of expression plasticity.

4
The dynamics of silent variation in Mimulus guttatus: Codon usage bias and linked selection

Madrigal Roca, L. J.; Kelly, J. K.

2026-05-20 evolutionary biology 10.64898/2026.05.18.725996 medRxiv
Top 0.1%
24.9%

Synonymous nucleotide variation, which is remarkably high in Mimulus guttatus, can be affected by both codon usage selection (translational efficiency) and linked selection (hitchhiking effects). Codon usage reflects a genome-wide tug-of-war between mutational pressure toward A/T-ending codons and weak selection favoring G/C-ending codons. The outcome is determined largely by gene expression level and localized variation in recombination rate. Using both mechanistic (ROC-SEMPPR) and population genetic models, we find that most genes are weakly selected for codon usage, about 76% yielding scaled selection coefficients (S = 4N_e s) in the range of 0 to 1. Additionally, 4029 genes, primarily involved in photosynthesis, translation, defense, and phosphate scavenging, experience strong selection (S > 1). Levels of nucleotide variation within genes indicate a strong effect of linked selection. Non-synonymous polymorphism declines in genes with strong purifying selection, and as the rate of (intra-genic) recombination declines. Levels of synonymous polymorphism usually track non-synonymous (owing to background selection), except in genes under the strongest translational selection. Counterintuitively, we find that codon usage selection has a generally positive effect on synonymous nucleotide diversity at 4-fold degenerate positions. Since mutation strongly disfavors the optimal base in M. guttatus, codon selection in the range of 0 < S < 2 evens the balance (between selection and mutation) and thus inflates heterozygosity.

5
Multiple molecular and cellular properties jointly affect protein and site-specific evolutionary rates

Saini, A.; Usmanova, D. R.; Supo Escalante, R.; Vitkup, D.

2026-05-23 evolutionary biology 10.64898/2026.05.20.726710 medRxiv
Top 0.1%
22.7%

Protein evolutionary rates vary widely across proteins and among sites within proteins, reflecting multiple molecular, cellular, and functional constraints. While protein-level properties, such as expression and essentiality, and site-level structural and functional constraints, are known to influence evolutionary rates, how these constraints combine across scales to determine site-specific evolutionary rates remains unclear. Moreover, because many protein features are strongly correlated, it is difficult to disentangle their individual contributions to evolutionary rate variance, and unified predictive models that integrate these properties are still lacking. Here, we use neural networks to predict protein evolutionary rates across multiple scales based on multiple molecular and cellular features. At the protein level, integrating molecular and cellular descriptors explains substantial variance in evolutionary rates across proteins in multiple eukaryotic species, including nearly 50% of the variance in humans and substantial fractions of the variance in other eukaryotic species. The model also allows us to identify proteins whose evolutionary rates deviate from expectations based on their molecular and cellular properties. At the site level, we found that structural and functional features explain a comparable fraction of the variance in relative evolutionary rates. By integrating protein-level and site-level predictors, the model explains up to 37% of the variance in site-specific evolutionary rates across proteins. Our analysis demonstrates that constraints at these two scales combine largely additively, with protein-level properties setting the overall evolutionary context and site-level properties shaping variation within proteins. Together, these results provide a quantitative framework for understanding protein evolution across biological scales.

6
Interpreting GC content differences across populations at polymorphic sites

Chandra, S.; Gao, Z.

2026-05-18 evolutionary biology 10.64898/2026.05.16.725686 medRxiv
Top 0.1%
21.9%

Recent studies have reported consistent inter-population differences in GC content at polymorphic sites in multiple species, including humans. Specifically, populations that experienced recent bottlenecks exhibit lower average GC content (GC%) at common polymorphic sites compared to non-bottlenecked groups--an observation previously interpreted as indication of rapid evolution of base composition. In this study, we investigate the evolutionary and technical factors driving these patterns across humans, mice, maize, and silkworm. We find that GC% at polymorphic sites is highly sensitive to the allele frequency threshold applied. Relaxing this threshold reduces inter-population differences to negligible levels in humans and significantly attenuates similar signals in other species. We further observe substantial GC% variation across allele frequency bins, a pattern driven by the differential abundance of different mutation types. We demonstrate that these observations are collectively driven by an interaction between demographic history and a universal excess of strong-to-weak mutations relative to weak-to-strong mutations, which is counteracted by GC-biased gene conversion (gBGC) over long evolutionary timescales. Forward-in-time simulations with realistic parameters recapitulate observed patterns of GC% variation across both populations and allele frequency bins. Overall, our findings reveal that the base composition at polymorphic sites is strongly shaped by the interaction between demographic history, mutation bias, and gBGC, and does not represent stable, genome-wide trends. Consequently, inter-population differences in GC content--especially at common variants--should not be interpreted as evidence of ongoing divergence in base composition or shifts in mutation patterns.

7
Convergent gene erosion in the chemical defensome of marine mammals

Danneels, B.; Oliveira, D. O.; Castro, F. L. C.; Karlsen, O. A.; Ruivo, R.; Goksoyr, A.

2026-05-23 genomics 10.64898/2026.05.21.726804 medRxiv
Top 0.1%
21.9%

To preserve homeostasis in the face of continual chemical insult, animals evolved dedicated molecular systems that detect, detoxify, and eliminate foreign compounds. Collectively, these enzymes, transporters, and regulatory pathways constitute the chemical defensome. In cetaceans, the loss of two key nuclear receptors (NR1I2/PXR and NR1I3/CAR) suggests a profound rearrangement of the chemical defense systems. Therefore, we investigated the gene inventory of the chemical defensome in Cetacea and two other major marine mammal lineages (Pinnipedia and Sirenia), using their closest terrestrial relatives to understand the extent and patterns of chemical defensome remodelling. We demonstrate large-scale gene loss in chemical defensome genes of cetaceans, as well as smaller scale gene loss in the other two marine mammal lineages, indicating possible convergent evolution. Gene loss occurred predominantly in phase I and phase II biotransformation enzymes, including CYPs, FMOs, SULTs, and GSTs. Many of the lost genes in cetaceans are known to be regulated by PXR and/or CAR, while genes lost in multiple marine mammal lineages are often not regulated by these transcription factors. We hypothesize that the transition to aquatic environments, often accompanied by corresponding changes in feeding habits, led to convergent loss of chemical defensome genes, and loss of PXR and CAR in cetaceans accelerated these losses. These findings reveal systematic erosion of chemical defense capabilities across marine mammal lineages, suggesting that adaptation to marine life involves trade-offs in detoxification capacity that may have significant implications for these species' responses to increasing chemical pollution in present-day ocean environments.

8
Promises and limitations of local ancestry inference in imputed ancient genomes

Bougiouri, K.; Irving-Pease, E. K.; Frantz, L. A. F.; Racimo, F.; Petr, M.

2026-05-20 evolutionary biology 10.64898/2026.05.19.725905 medRxiv
Top 0.2%
14.3%

Recent advances in genome imputation have enabled the application of state-of-the-art statistical methods--originally developed for present-day genomes--to ancient genomes. One class of such methods, known as local ancestry inference (LAI), can model an individual's genome as a mosaic of tracts assigned to different putative ancestral sources, revealing patterns of genetic ancestry across the genome. However, most LAI methods have been designed to study recent admixture events in human history, and they generally assume large panels of present-day genomes. Despite the recent availability of high-quality imputed ancient genomes, it remains unknown to what degree LAI is reliable for such datasets. Ancient DNA is often characterized by heterogeneous geographic and temporal sampling, varying degrees of divergence between ancient source proxies and admixing populations, and complex demographic histories. Here, we performed an extensive set of population genetic simulations to evaluate the accuracy of four popular LAI methods--RFMix, FLARE, MOSAIC and simpLAI--under different demographic scenarios, various temporal sampling schemes, sample sizes, and admixture dates. We quantify the accuracy of these methods as a function of different parameters in practically relevant scenarios, and provide general guidelines for future studies utilizing LAI in ancient DNA research.

9
Phylogenetically estimated neutral rates and fitness effects of mutations to influenza proteins

Haddox, H. K.; Hinrichs, A. S.; Jennings-Shaffer, C.; Johnson, K.; Benton, C. T.; Galloway, J. G.; Bloom, J. D.; Matsen, F. A.

2026-05-20 bioinformatics 10.64898/2026.05.18.725477 medRxiv
Top 0.2%
13.9%

Influenza virus's rapid evolution is shaped by both neutral mutation and selection. Phylogenetics can be used to study these processes, but this approach has typically only been applied to a few thousand influenza genome sequences at once. Here, we built phylogenetic trees with >100,000 influenza sequences, and then used these trees to estimate neutral rates of mutations to the virus's genome. Neutral rates varied by up to ~100-fold among the 12 nucleotide mutation types (A→C, A→G, etc.). These rates were highly correlated among influenza, SARS-CoV-2, and HIV, though more nuanced context-dependent patterns showed marked differences between influenza and SARS-CoV-2. We also estimated fitness effects of mutations by comparing the number of times a mutation was observed to occur along the branches of a tree to the number of times we expect it to have occurred under neutrality. We estimated effects for ~33,000 nonsynonymous and ~8,000 synonymous mutations spanning all influenza proteins. This compendium of estimated effects helps map the relationship between sequence and fitness in a natural setting, including regions where synonymous mutations are under functional constraint, and for proteins with limited experimentally measured effects. We built interactive heatmaps of the estimated fitness effects to help readers explore these data (see https://matsen.group/flu-mut-rates). Altogether, this work places influenza's mutation rates in a broader cross-viral context and deepens our understanding of how mutation and selection shape influenza evolution in nature at a site-specific level.
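The observed-versus-expected comparison described in this abstract can be sketched as a log ratio of counts. The log-ratio form, the pseudocount of 0.5, and the function name are assumptions chosen for illustration; the abstract only states that observed and expected counts are compared and does not give the estimator.

```python
from math import log


def fitness_effect(observed, expected, pseudocount=0.5):
    """Log ratio of observed to expected mutation counts.

    observed: number of times the mutation occurs along tree branches.
    expected: number of occurrences expected under neutrality.
    pseudocount: assumed small constant that keeps the ratio finite
        when a mutation is never observed.

    Negative values suggest the mutation is seen less often than expected
    (deleterious); values near zero suggest near-neutrality.
    """
    if observed < 0 or expected < 0:
        raise ValueError("counts must be non-negative")
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    return log((observed + pseudocount) / (expected + pseudocount))


# Hypothetical counts for three mutations.
print(fitness_effect(10, 10))  # observed matches expectation
print(fitness_effect(0, 20))   # never observed despite a high expected count
print(fitness_effect(30, 10))  # observed more often than expected
```

Estimates built this way are noisy when the expected count is small, which is one reason such analyses benefit from trees with very many sequences.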

10
Substitution rate variation, not hidden paralogy, drives false hybridization signal in phylogenetic network inference

Li, B.; Ane, C.

2026-05-18 evolutionary biology 10.64898/2026.05.11.723986 medRxiv
Top 0.3%
12.6%

Phylogenetic network inference methods are increasingly used to detect hybridization and gene flow from genomic data, but their robustness to common sources of model violation remains poorly characterized. We conducted a simulation study to evaluate the effects of hidden paralogy and substitution rate variation on two widely used network inference methods: find_graphs from ADMIXTOOLS 2 and SNaQ. Using an eight-taxon species tree calibrated from an empirical reptile phylogeny, we simulated data under various levels of hidden paralogy (from none to strong) and three levels of rate variation (none, gene-specific, and lineage-specific). We found that hidden paralogy had limited impact on network inference under the conditions examined: both network methods correctly favored a tree without reticulation, and ASTRAL recovered the correct species tree every time. In contrast, lineage-specific rates severely biased find_graphs, inflating worst f-statistic residuals well beyond the standard acceptance threshold. SNaQ correctly selected a tree model almost always across all conditions, though its network with h = 1 reticulation displayed the true species tree with a lower probability under lineage-specific rates. We also show that the standard worst residuals threshold of 3 for find_graphs produces inflated type I error even without rate variation, and we recommend empirical calibration of this threshold within each study system.

11
A Rarefaction Approach to Identify Local Introgression in a Three Population Tree

Smith, T. Q.; Szpiech, Z. A.

2026-05-16 evolutionary biology 10.64898/2026.05.13.724952 medRxiv
Top 0.3%
12.5%

Patterson's D statistic, also known as the ABBA-BABA statistic, is widely used to detect the presence of archaic genome-wide introgression between two non-sister taxa. Requiring only a single lineage from each of four taxa, where one taxon acts as an outgroup to determine the ancestral allele, Patterson's D measures the imbalance between the number of biallelic sites where the second and third taxa share the derived allele (ABBA sites) and the number where the first and third taxa share it (BABA sites). When there is no introgression, these counts are expected to be equal, and a discordance between counts suggests introgression from the third taxon into either the first or second. Patterson's D is limited to the detection of genome-wide introgression and exhibits a high false-positive rate when applied to smaller genomic segments. Here, we present a new method, D STatistic with Allelic Rarefaction (D*), to address these limitations. D* uses multiple lineages and does not require an outgroup to calculate the imbalance between the number of alleles found exclusively in the second and third taxa and the number of alleles found exclusively in the first and third taxa. D* employs a rarefaction technique to correct for unequal sample sizes and allows multiallelic sites. We use simulations to show that D* has better precision and recall for detecting introgressed segments of DNA when compared to similar methods under a wide variety of model parameters and in the presence of technical artifacts common to ancient DNA analyses. We conclude with an analysis of Denisovan DNA introgression in modern-day Papuans. Precompiled executables, the manual, and source code can be found at https://github.com/TQ-Smith/DSTAR
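For orientation, the classical Patterson's D that this abstract takes as its starting point can be computed from site-pattern counts as D = (nABBA - nBABA) / (nABBA + nBABA). The sketch below shows only that classical statistic, not the D* method; the input encoding (one four-letter pattern string per biallelic site, with "A" ancestral and "B" derived, in the order taxon 1, taxon 2, taxon 3, outgroup) is an assumption made for illustration.

```python
def pattersons_d(site_patterns):
    """Classical Patterson's D from per-site allele patterns.

    site_patterns: iterable of 4-character strings such as "ABBA" or "BABA",
        giving the allele state (A = ancestral, B = derived) in taxon 1,
        taxon 2, taxon 3, and the outgroup. Other patterns are ignored.
    """
    n_abba = 0
    n_baba = 0
    for pattern in site_patterns:
        if pattern == "ABBA":    # taxa 2 and 3 share the derived allele
            n_abba += 1
        elif pattern == "BABA":  # taxa 1 and 3 share the derived allele
            n_baba += 1
    total = n_abba + n_baba
    if total == 0:
        raise ValueError("no ABBA or BABA sites; D is undefined")
    return (n_abba - n_baba) / total


# Hypothetical data: three ABBA sites, one BABA site, one uninformative site.
sites = ["ABBA", "ABBA", "ABBA", "BABA", "BBAA"]
print(pattersons_d(sites))  # 0.5
```

A value near zero is what is expected without introgression. In practice significance is usually assessed with a block jackknife over the genome, which this sketch omits.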

12
Lifestyles of Gypsy-family transposons shape their regulatory mechanisms

Papameletiou, A.-M.; Czech Nicholson, B.; Bornelöv, S.; Hannon, G. J.

2026-05-21 genomics 10.64898/2026.05.19.726053 medRxiv
Top 0.3%
12.3%

Transposable elements are a highly diverse group of selfish genomic elements, prevalent across the tree of life, whose uncontrolled propagation poses a threat to genome stability. Recent studies have explored the evolution of Drosophila melanogaster transposable elements, their co-evolution with the host genome, and mechanisms that regulate their activity. However, little is known about their cross-species evolutionary patterns. Long terminal repeat (LTR) retrotransposons are the most active group of transposable elements in Drosophila. They are broadly separated into retroelements, which are active in the germline, and insect endogenous retroviruses that are active in the soma. Somatic elements are hypothesised to infect the germline through their acquisition of virus-derived proteins such as Envelope and sORF2, thus multiplying through successive generations. In this study, we curated the sequences of LTR retrotransposons in 249 drosophilid genomes, allowing us to study their evolution across these species and highlight their varying degrees of conservation. Furthermore, we reveal multiple instances of Envelope protein loss or inactivation that suggest shifts in the expression pattern of these transposons, likely accompanied by adopting different transcriptional control mechanisms. We contrast this with the evolutionary history of sORF2, which we found to be much more stable. Lastly, we examined variations in transposon LTR regions responsible for transcriptional regulation and used predictive modelling to suggest six transcription factors likely involved in their tissue-specific expression. Altogether, we reveal complex, interspecies evolutionary patterns of Gypsy-family LTR retrotransposons and highlight examples of their co-evolution with their host genome.

13
The Culicinae are Monophyletic and Ancient: A response to Pierce et al. 2025

Soghigian, J.; Morinaga, G.; Yeo, H.; Wilkerson, R.; Linton, Y.-M.; Sallum, M. A.; Sharakov, I.; Sharakova, M.; Laurito, M.; Bang, W. J.; Shin, S.; Snyman, L.; Zavortink, T.; Sither, C.; Reiskind, M.; Wiegmann, B.

2026-05-06 evolutionary biology 10.64898/2026.05.04.720205 medRxiv
Top 0.3%
12.3%

Mosquitoes are classified into two subfamilies, each monophyletic, and typically considered to both be ancient, having diverged more than 100 million years ago based on previous divergence analyses. A recent publication challenged this view with phylogenomic results primarily from the third codon position and UCEs. Utilizing alternative fossil placement and these phylogenomic data, these authors find that the Culicidae and Chaoboridae diverged in the lower Cretaceous, and that one mosquito subfamily, the Anophelinae, is nested within the Culicinae. These results are in stark contrast to previous results from diverse data sources, ranging from other genomic data, to morphology, to fossils. Here, we briefly detail the substantial evidence that supports two monophyletic subfamilies of extant mosquitoes, along with fossil evidence that supports the ancient divergence of these lineages.

14
South Asian Maternal Lineage haplogroup R30 Provides Phylogenetic Evidence of human dispersal across South Asia

Desai, S.; Adhikary, V.; Bhattacharyya, M.; Tharu, M. K.; Sharma, A.; Sequeira, J. J.; Pandey, R. k.; Pandey, P.; Shendre, S. S.; Tayyeh, A. M.; S, S. L.; Mustak, M. S.; Petraglia, M.; Chaubey, G.

2026-05-04 evolutionary biology 10.64898/2026.04.29.721543 medRxiv
Top 0.3%
11.9%

South Asia is central to debates on early human dispersals, particularly the Out of Africa model and Eurasian colonization. Studies of M haplogroups have been used to support both Northern and Southern route hypotheses, but current archaeological and genetic evidence in the region remains contradictory. In the present work, we find that in addition to haplogroup M lineages, a few R lineages exhibit ancient, locally rooted variation, with R30 being one of the widespread haplogroups among R lineages across South Asia. To better understand South Asian demographic history, we investigated the phylogeographic distribution of haplogroup R30, an indigenous lineage. We used 190 complete modern and ancient sequences from diverse mainland and island populations, including 44 newly generated sequences, which enabled the refinement of the R30 phylogeny and the identification of a novel basal lineage, R30c. Bayesian and ρ-based age estimates suggest that R30 originated in the Indian subcontinent ~50 kya. Early diversification likely occurred in Northern India, giving rise to R30b (~44 kya), while R30a and R30c differentiated primarily in Southern India. Several subclades of haplogroup R30 exhibit strong signatures of founder effects, particularly among the language isolate Vedda of Sri Lanka, Uru Kurumban of Southern India, and the populations of the Lakshadweep archipelago. Bayesian skyline analyses indicate long-term demographic stability followed by rapid lineage expansion ~20 kya and more recent declines consistent with localised drift and relatively recent founder events. The presence of early-diverging R30 lineages in Thailand and Indonesia further supports long-term connections between South and Southeast Asia. Overall, archaeological and genetic evidence point towards multiple migrations in the colonization of South Asia.

15
Diversity and divergence of two sympatric, sibling octopus species

Coffing, G.; Tittes, S.; Small, S. T.; Kern, A. D.

2026-05-04 evolutionary biology 10.64898/2026.04.30.721928 medRxiv
Top 0.3%
11.7%

Coleoid cephalopods have convergently evolved many traits shared with vertebrates, including camera-type eyes, large brain-to-body size ratios, and complex behaviors. Most evolutionary studies of cephalopods have compared individual genomes of taxa that diverged tens to hundreds of millions of years ago, yet very few have examined more recent evolution from a population genetics perspective. Here we present a comparative population genomic analysis of the sympatric sister species Octopus bimaculatus and Octopus bimaculoides using whole-genome resequencing. Despite similar morphologies, these species differ substantially in their life histories, ecologies, and geographic distributions. Using demographic inference, we estimated that the two species diverged approximately one million years ago and that O. bimaculatus has maintained a consistently larger effective population size since divergence. Consistent with these demographic histories, we found stronger signatures of positive selection in O. bimaculatus, including a positive correlation between recombination rate and nucleotide diversity, more selective sweeps, and a higher proportion of mutations fixed by adaptation--all consistent with more efficient natural selection in larger populations. Protein-coding genes overlapping with selective sweeps were enriched for various functions that included many related to brain and eye development, suggesting that traits characteristic of coleoid cephalopods continue to be shaped by positive selection on recent timescales in these species. Comparing coding-sequence divergence on the Z chromosome to the autosomes, we also find evidence for a female-biased mutation rate, consistent with an independent estimate from a deeper-timescale cephalopod comparison.

16
Evidence for independent retroviral syncytin-like Env endogenization in non-placental chondrichthyans

Proudley, E.; Reddin, I. G.; Cleal, J. K.; Lewis, R. M.; Laundon, D.

2026-05-07 evolutionary biology 10.64898/2026.05.06.723177 medRxiv
Top 0.4%
9.9%

Viviparity and placentation are remarkable examples of convergent evolution across vertebrates. The evolution of the uniquely intimate mammalian placenta has been associated with the repeated independent capture of fusogenic retroviral Env proteins, called syncytins. Research into syncytin capture has therefore been predominantly focused on resolving their central role in mammalian placentation. As such, the presence of syncytin-like Env proteins outside of mammals, and their role in non-placental physiological contexts, remain much less understood. We expanded this understanding by systematically surveying genomes from 36 chondrichthyan species (sharks, rays, skates, and chimaeras), which display a wide range of independently evolved placental and non-placental reproductive strategies, for the presence of syncytin-like Env genes. We identified 295 candidate syncytin-like Env proteins from 16 chondrichthyan species, with a subset displaying conserved fusogenic domains, structural homology with known syncytins, and genomic signatures of endogenization. Using transcriptomic data from the model catshark Scyliorhinus canicula, we found that syncytin-like Env genes are transcriptionally active in diverse adult tissue types. Using two closely related species of Squalus (spiny dogfish), we present evidence that endogenized Env genes are syntenically conserved, indicative of vertical transmission from a common ancestor before species divergence. Notably, we detected no candidates in any placental shark genome, suggesting that syncytin-like Env capture is not a feature of shark placentation. Our findings expand the known phylogenetic breadth and functional scope of syncytin-like Env protein endogenization beyond mammalian placentation, providing a solid foundation for future investigations into the wider role of retroviral capture in vertebrate biology and evolution.

17
Genome-wide associations of host susceptibility to helminth and blood pathogens in spatially structured rodent populations

Olarewaju, A. E.; Bryk, J.; Ayansola, V. I.; Dunn, A.; Rybinska, A.; Kloch, A.

2026-05-21 evolutionary biology 10.64898/2026.05.19.726205 medRxiv
Top 0.4%
9.9%

Parasites are ubiquitous drivers of host evolution, exerting strong selective pressure on natural populations. Understanding the genetic basis of host susceptibility to infection is important for understanding how host-pathogen interactions shape patterns of resistance and diversity in natural populations. We conducted a genome-wide association study (GWAS) to identify host genetic variants associated with infection by helminth and blood pathogens in spatially structured populations of bank voles (Myodes glareolus (Schreber, 1780)). We genotyped 182 individuals sampled from ten sites in central Europe using quaddRAD sequencing, retaining 30,206 high-quality single-nucleotide polymorphisms (SNPs). Associations between SNP genotypes and parasite infection status were tested using mixed models controlling for relatedness, with host body mass included as a covariate. Across parasite taxa, we identified twelve SNPs exceeding genome-wide significance, with the strongest signals detected for the intestinal nematode Heligmosomum mixtum. The variants identified are all intergenic, intronic, upstream or downstream of genes, with none predicted to alter coding sequences. These genes are not classical immunity genes but some are implicated in cytokine production, PI3K/AKT signalling and the p38 MAPK pathway, suggesting that selective pressure from pathogens acts not only on known immunity genes, but also on broader regulatory and metabolic networks. This finding suggests that variation in gene expression may be important for the differences in host susceptibility or resistance to parasitic infections.

18
Haplotype-based models improve sweep detection in ancient populations with complex demography

Sequeira, A. N.; Szpiech, Z. A.; Huber, C. D.

2026-05-11 genetics 10.64898/2026.05.08.723766 medRxiv
Top 0.4%
9.8%

Identifying signatures of positive selection in humans is complicated by demographic processes such as bottlenecks, migration and admixture, all of which can distort or obscure the genomic patterns produced by selective sweeps. Ancient DNA offers a direct window into past allele and haplotype frequencies, yet most sweep scans in ancient populations rely on allele-frequency or site frequency spectrum (SFS) summaries, with limited use of haplotype-based approaches. Here, we evaluate the performance of haplotype-based and SFS-based methods for detecting selective sweeps under demographic scenarios that reflect the complex history of ancient and modern Europeans. We extend the haplotype-based likelihood framework saltiLASSI to accommodate pseudohaploid ancient genomes, enabling the use of truncated haplotype frequency spectra and their spatial decay to detect sweeps without requiring phased data. Using forward-in-time simulations, we examine sweeps of varying ages, two pulses of admixture with different source proportions, and cases where selection continues or ceases after admixture. We compare saltiLASSI to a widely used SFS-based approach (SweepFinder2). Our results show that haplotype-based likelihood models retain higher power than SFS-based methods in admixed populations, particularly when sweep haplotypes are introduced through migration or when selection has not had sufficient time to regenerate a clear SFS signature after admixture. These findings highlight the promise of haplotype-based inference for ancient DNA and demonstrate how model-based approaches can improve the detection of historical selective sweeps in populations with complex demographic histories.

19
Origins of eukaryotic metabolism

Santana-Molina, C.; Spang, A.; Snel, B.

2026-05-12 evolutionary biology 10.64898/2026.05.08.723234 medRxiv
Top 0.5%
9.2%

The origin of eukaryotes is a key event in the evolution of cellular life, hypothesized to involve a symbiotic integration between a member of the Asgard archaea and a member of the Alphaproteobacteria. Recent work has provided evidence for additional genetic input from other prokaryotes to the eukaryotic proteome, yet the extent and sources of these contributions remain debated. Here we aimed to further resolve the prokaryotic origins of eukaryotic genes to inform our understanding of eukaryogenesis. Specifically, we developed a phylogenetic framework to investigate the origins of eukaryotic gene families associated with metabolism and, for comparison, with informational processing. We found that informational processing genes were predominantly derived from archaea, whereas eukaryotic metabolism is highly chimeric in its origin. In contrast to previous studies, we report a substantial number of archaeal origins of diverse metabolic enzymes, including key metabolic regulators. This highlights an overlooked participation of archaeal metabolism and pinpoints potential metabolic integrations during eukaryogenesis. Apart from the alphaproteobacterial contributions to eukaryotic metabolism, we found an additional dominant phylogenetic signal of genes potentially derived from Myxococcota, especially for gene families associated with lipid metabolism. By systematically analysing the origins of eukaryotic metabolism, this research offers novel insights into the origin of eukaryotic membranes and refines our current models for the origin of the eukaryotic cell.

20
Population genomics of nicotinic acetylcholine receptors in Anopheles funestus reveals rapid evolution of the α9 and β2 subunits within a constrained gene family

Rios, D.; Fouet, C.; Kamdem, C.

2026-05-16 evolutionary biology 10.64898/2026.05.15.725454 medRxiv
Top 0.5%
9.1%

The deployment of clothianidin-based insecticide formulations in malaria vector control has highlighted the capacity of Anopheles funestus to displace more susceptible mosquito species in treated areas and to rapidly evolve resistance under selection pressure. Metabolic detoxification, together with structural and genetic changes in nicotinic acetylcholine receptors (nAChRs), the primary molecular targets of neonicotinoids, can reduce insecticide efficacy. Here, we characterized amino acid substitutions across all 11 nAChR subunits in An. funestus to assess standing variation that may facilitate adaptive responses to chemical exposure. Using whole-genome sequencing data from 656 mosquitoes sampled in 13 African countries, we found marked contrasts in the distribution of nonsynonymous variants among nAChR subunits. Most subunits are strongly constrained and carry no missense variants, whereas two loci (α3 and α7) display three geographically widespread amino acid substitutions across the continent. In contrast, α9 and β2 accumulate dozens of nonsynonymous mutations occurring at intermediate to high frequencies, including within domains involved in orthosteric ligand binding and channel gating. Genetic differentiation at nAChR loci among populations from different countries is low to moderate, although several nonsynonymous mutations display high FST values consistent with geographic structuring. These results highlight relaxed constraint on two subunits that may provide opportunities for evolutionary diversification within a conserved family of multimeric receptor assemblies. Such diversification has not been observed in vector species displaced by An. funestus in indoor residual spraying areas, and the potential implications for reduced sensitivity to neonicotinoids are discussed.